Summary

A Rewrite System is a collection of rewrite rules of the form α→β, where α and β are tree patterns. A rewrite system can be extended by associating a cost with each rewrite rule, and by defining the cost of a rewrite sequence as the sum of the costs of all the rewrite rules in the sequence. The REACHABILITY problem for a rewrite system R is, given an input tree T and a fixed goal tree G, to determine if there exists a rewrite sequence in R rewriting T into G and, if so, to obtain one such sequence. The C-REACHABILITY problem is similar except that the obtained sequence must have minimal cost among all those sequences rewriting T into G.

This paper introduces a class of rewrite systems called Bottom-Up Rewrite Systems (BURS), and a table-driven algorithm to solve REACHABILITY for members of the class. This algorithm is then modified to solve C-REACHABILITY and specialized for a subclass of BURS so that all cost manipulation is encoded into the tables and is not performed explicitly at solving time. The subclass extends the simple machine grammars [AGH84], rewrite systems used to describe target machine architectures for code generation, by allowing additional types of rewrite rules such as commutativity transformations.

A table-driven code generator based on solving C-REACHABILITY has been implemented and tested with several machine descriptions. The code generator solves C-REACHABILITY faster than a comparable solver based on Graham-Glanville techniques [AGH84] (a non-optimal technique), yet requires only slightly larger tables. The code generator runs much faster than recent proposals to solve C-REACHABILITY that use pattern matching and deal with costs explicitly at solving time [AGT86, HeD87, WeW86]. The BURS theory generalizes and unifies the bottom-up approaches of Henry/Damron [HeD87] and Weisgerber/Wilhelm [WeW86].

† This research was partially sponsored by the Defense Advanced Research Projects Agency (DoD), Arpa Order No. 4871, monitored by the Naval Electronic Systems Command under Contract No. N00039-84-C-0089.

1. Introduction

Trees are convenient representations for many applications because of their hierarchical structure and the ease with which they can be manipulated. Frequently this manipulation corresponds to transformations between different tree representations. In this paper we study a mechanism to describe tree transformations, rewrite systems, together with a specific tree transformation problem, REACHABILITY, and its application to the generation of optimal code for expression trees.

In this paper, trees are denoted either by graphs (as in Figure 1.1) or by a prefix linearization. For example, op(T_1,T_2) denotes the tree with root op and subtrees T_1 and T_2. The node labels are taken from an alphabet Op of operators, and all operators are assumed to have fixed arity. Patterns are trees over an alphabet that has been extended with new symbols of arity 0 called variables. In the examples, variables are represented by X, Y, or X_i (i ≥ 0), and all other symbols stand for operators. If σ is a value assignment for the variables present in a pattern p, σ(p) denotes the replacement of the variables by the values associated by σ. A pattern p matches at a tree T if there is an assignment σ of values to the variables in the pattern such that σ(p) is T. Thus, the pattern +(X,Y) matches at any tree rooted with + and having two subtrees, corresponding to X and Y respectively. Two patterns p_1 and p_2 are said to be equivalent if they are identical up to a systematic renaming of the variables. All patterns in this paper are linear, i.e., every variable appears at most once. A rewrite rule is of the form α→β, where α and β are patterns; α is called the input pattern and β the output pattern, and all the variables in β must appear in α. A rewrite system is just a collection of rewrite rules. Figure 1.1 shows a rewrite system that we will use in our examples.
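To make matching and substitution concrete, here is a minimal Python sketch. The encoding is our own assumption, not something prescribed by the paper: a tree node is a tuple (label, subtree_1, …, subtree_n), so the leaf Const is ("Const",), and a pattern variable is a bare string such as "X".

```python
from typing import Optional, Union

# Assumed encoding (not from the paper): a tree node is a tuple
# (label, subtree_1, ..., subtree_n); in a pattern, a bare string is a variable.
Pattern = Union[str, tuple]


def match(p: Pattern, t: tuple, sigma: Optional[dict] = None) -> Optional[dict]:
    """Return an assignment sigma with substitute(p, sigma) == t, or None."""
    sigma = {} if sigma is None else sigma
    if isinstance(p, str):                     # a variable matches any subtree
        if p in sigma and sigma[p] != t:       # only possible for non-linear patterns
            return None
        sigma[p] = t
        return sigma
    if p[0] != t[0] or len(p) != len(t):       # same operator, same arity
        return None
    for p_child, t_child in zip(p[1:], t[1:]):
        if match(p_child, t_child, sigma) is None:
            return None
    return sigma


def substitute(p: Pattern, sigma: dict) -> Pattern:
    """sigma(p): replace each variable of p by the value sigma assigns to it."""
    if isinstance(p, str):
        return sigma.get(p, p)                 # unassigned variables are kept
    return (p[0],) + tuple(substitute(c, sigma) for c in p[1:])
```

For example, match(("+", "X", "Y"), ("+", ("Reg",), ("Const",))) binds X to ("Reg",) and Y to ("Const",), while the pattern Fetch(X) does not match that tree.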

A position in a tree is a sequence of integers (separated by · for readability) representing a “path” from the root of the tree to a node in the tree. If p is a position in T, the subtree of T rooted at p is denoted by T@p. The root position in a tree is designated by the empty sequence ε; each integer corresponds to an index from left to right commencing with 1. If k·s is a sequence with head an integer k and tail a sequence s, and ≡ is read as “is defined as”, positions and subtrees are related as follows:

$$T@\varepsilon \equiv T, \qquad op(T_1,\ldots,T_n)@(k\cdot s) \equiv T_k@s \quad \text{if } 1 \le k \le n.$$
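Under the same assumed tuple encoding, the two defining equations for T@p translate directly into a recursive function. In this illustrative sketch a position is a Python tuple of 1-based integers, with () standing for ε.

```python
def subtree_at(t: tuple, pos: tuple) -> tuple:
    """T@p for the assumed tuple encoding; positions are tuples of 1-based ints."""
    if not pos:                                # T@ε ≡ T
        return t
    k, rest = pos[0], pos[1:]
    if not 1 <= k <= len(t) - 1:               # defined only for 1 ≤ k ≤ n
        raise IndexError(f"position {pos} does not exist in the tree")
    return subtree_at(t[k], rest)              # op(T_1, ..., T_n)@(k·s) ≡ T_k@s
```

For the running input tree +(0,+(Const,Const)), position 2·1 designates the first Const leaf.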

A rewrite rule r: α→β is applicable to a tree T at a position p if α matches at T@p. If r is applicable to T at p with variable assignment σ, the application of r to T at p is a new tree, identical to T except that the subtree T@p is replaced by σ(β). A rewrite sequence is just a sequence of applications of rewrite rules:

Definition 1.1 A rewrite application for a rewrite system R is a pair <r,p>, where r is a rule in R and p is a position. A rewrite sequence for R is a sequence τ of rewrite applications. If τ = <r_0,p_0> ··· <r_n,p_n> is a rewrite sequence, then τ is applicable to a tree T if r_0 is applicable to T@p_0 and its application yields T_1, and, for 1 ≤ i ≤ n, r_i is applicable to (T_i)@p_i and its application is T_{i+1}. The application of τ to T is denoted τ(T) and is T_{n+1}. We say that a rewrite sequence is valid if there is some tree to which it is applicable. The length of a rewrite sequence is the number of rewrite applications in it. The composition of a rewrite sequence is a rewrite rule (possibly not in R) that is applicable whenever the rewrite sequence is applicable and always yields the same result as the sequence.

Figure 1.2
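The following sketch, again assuming the tuple encoding used above, applies a single rule at a position and then a whole rewrite sequence as in Definition 1.1. Rules are (α, β) pairs in a dictionary keyed by names of our choosing; the helper names are ours and the rules used below are hypothetical, not the rules of Figure 1.1.

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding (ours, not the paper's): a node is a tuple
# (label, child_1, ..., child_n); pattern variables are bare strings;
# a rule is a pair (alpha, beta); a position is a tuple of 1-based ints.
Rule = Tuple[object, object]


def _match(p, t, sigma: dict) -> bool:
    """Extend sigma so that sigma(p) == t (patterns are assumed linear)."""
    if isinstance(p, str):
        sigma[p] = t
        return True
    return (p[0] == t[0] and len(p) == len(t)
            and all(_match(a, b, sigma) for a, b in zip(p[1:], t[1:])))


def _subst(p, sigma: dict):
    """sigma(p) for a pattern p whose variables are all bound in sigma."""
    if isinstance(p, str):
        return sigma[p]
    return (p[0],) + tuple(_subst(c, sigma) for c in p[1:])


def _subtree_at(t, pos) -> Optional[tuple]:
    """T@p, or None if p is not a position of T."""
    for k in pos:
        if not 1 <= k < len(t):
            return None
        t = t[k]
    return t


def _replace_at(t, pos, new) -> tuple:
    """Copy of t whose subtree at pos (assumed to exist) is replaced by new."""
    if not pos:
        return new
    k = pos[0]
    return t[:k] + (_replace_at(t[k], pos[1:], new),) + t[k + 1:]


def apply_rule(rule: Rule, t: tuple, pos: tuple) -> Optional[tuple]:
    """Apply r: alpha -> beta to t at pos; None if r is not applicable there."""
    alpha, beta = rule
    sub = _subtree_at(t, pos)
    sigma: dict = {}
    if sub is None or not _match(alpha, sub, sigma):
        return None
    return _replace_at(t, pos, _subst(beta, sigma))


def apply_sequence(seq: List[Tuple[str, tuple]], rules: Dict[str, Rule],
                   t: tuple) -> Optional[tuple]:
    """tau(T) for tau = <r_0,p_0> ... <r_n,p_n>; None if tau is not applicable."""
    for name, pos in seq:
        t = apply_rule(rules[name], t, pos)
        if t is None:
            return None
    return t
```

With hypothetical rules load: Const → reg and add: +(reg,reg) → reg, the sequence <load,1><load,2><add,ε> rewrites +(Const,Const) into reg, whereas <add,ε> alone is not applicable to that tree.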

If τ is a rewrite sequence for T such that all the applications in τ have positions below p (that is, position p is an initial sequence of all the application positions), the restriction of τ to p, τ@p, is the sequence of applications identical to τ except that every position is stripped of the initial sequence corresponding to p.
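A direct transcription of the restriction τ@p, assuming (as in the sketches above) that applications are (rule name, position) pairs and positions are tuples of integers:

```python
def restrict(seq, p):
    """tau@p: strip the prefix p from every position of tau.

    Every application position must lie at or below p, i.e. have p
    as an initial sequence; otherwise the restriction is undefined.
    """
    n = len(p)
    if any(pos[:n] != p for _, pos in seq):
        raise ValueError("some application is not at or below position p")
    return [(rule, pos[n:]) for rule, pos in seq]
```

For instance, restricting <r7,1><r5,1·2> to position 1 gives <r7,ε><r5,2>.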

Not every rewrite sequence has a composition. A rewrite system R defines transformations between sets of trees through its rewrite sequences: a tree T can be mapped into a tree T′ if there exists a rewrite sequence in R taking T into T′. The transformation is, in general, many-to-many.

The problems studied in this paper are the following:

Problem REACHABILITY Let R be a rewrite system over an alphabet Op, and let L_i and L_o be two sets of trees over Op. The reachability problem for R, L_i, and L_o is, given any T ∈ L_i and any T′ ∈ L_o, to determine if there is a rewrite sequence τ for R applicable to T such that τ(T) = T′, and, if so, to produce one such sequence.

If L_o is a singleton {G}, then the reachability problem is called the fixed-goal REACHABILITY problem, and G is called the goal. The BLOCKING problem for R, L_i, and goal G is to determine if there exists a tree T ∈ L_i that cannot be rewritten into G by R.


This paper is only concerned with the fixed-goal version of REACHABILITY, and with BLOCKING; see [Pel87] for some considerations on general reachability. In our example, given the input tree +(0,+(Const,Const)), one solution to REACHABILITY for goal reg is the sequence of Figure 1.2. If L_i consists only of trees with labels Reg, Const, 0, and +, the rewrite system of Figure 1.1 never blocks.

A rewrite system can be extended by assigning a cost to each rewrite rule. The cost of a rewrite sequence for an extended rewrite system can then be defined as the sum of the costs of all the rewrite rules in the sequence. This leads to a variation of the reachability problem, called C-REACHABILITY, where the objective is not only to provide a rewrite sequence but to provide one with minimum cost.

Returning to our example, if the cost of each of the rewrite rules of Figure 1.1 were defined to be 1, then the cost of the sequence of Figure 1.2 would be 7. That rewrite sequence is not a solution to C-REACHABILITY; the smallest possible cost is 6, and it may be obtained by the sequence of Figure 1.3.
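Written out, with cost(r) as our notation for the cost attached to rule r, the cost of a sequence τ is given by the first formula below. With unit costs it collapses to the length of the sequence, which is why the two sequences just discussed cost 7 and 6: they consist of seven and six rewrite applications, respectively.

$$\operatorname{cost}(\tau) = \sum_{i=0}^{n} \operatorname{cost}(r_i), \qquad \tau = \langle r_0,p_0\rangle \cdots \langle r_n,p_n\rangle$$

$$\operatorname{cost}(r) = 1 \text{ for every rule } r \;\Longrightarrow\; \operatorname{cost}(\tau) = n + 1 = \text{length of } \tau$$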


The rewrite system of Figure 1.1 is chosen so that applicable rewrite sequences correspond to instructions for some (hypothetical) target machine¹. Hence, the rewrite sequences in Figure 1.2 and Figure 1.3 correspond to instruction sequences for the given input tree and for the target machine. If the rewrite system accurately describes the target machine, a solution to REACHABILITY provides a correct sequence of instructions for the input tree. If, in addition, the costs of the rewrite rules correctly represent the desired properties of the target machine, a solution to C-REACHABILITY provides a locally optimal instruction sequence, and BLOCKING corresponds to detecting the existence of an input tree for which code cannot be generated. Examples of typical cost metrics are the number of cycles and the number of bytes referenced. In the presence of features like pipelines and caches, the number of cycles will be only a static approximation of the execution costs. Previous research (notably [GrH84, Hen84]) has shown how to write target machine descriptions using a variety of techniques, and allows us to conclude that although some features, like constraints on the number of registers, must be handled outside the framework of C-REACHABILITY, an efficient algorithm to solve C-REACHABILITY can be used to provide an efficient algorithm for locally optimal code generation.

¹ r1 might be used to generate a register-to-register move if the input register could not be modified.

The rest of this paper is organized as follows. Section 2 shows how to solve REACHABILITY for a special class of rewrite systems, then Section 3 modifies the techniques and applies them to solving C-REACHABILITY. Finally, Section 4 discusses a code generator generator implemented following the theory of the previous sections. The paper concludes with a discussion of related work.

2. Solving REACHABILITY

REACHABILITY is solved by characterizing all the possible rewrite sequences with a bottom-up tree automaton [Tha73]. We use two notions of state: local rewrite graphs (LR graphs) and uniquely invertible local rewrite graphs (UI LR graphs). Without loss of generality, we assume that the goal tree for fixed-goal REACHABILITY is a leaf labeled with a distinguished nullary operator which appears only as an output pattern in a rewrite rule.

The first step in defining LR graphs is to restrict attention to rewrite sequences in a normal form that rewrites the input tree bottom-up. A rewrite sequence can be put in normal form by “reordering” the rewrite applications.

Definition 2.1 Let r_0 and r_1 be two rewrite rules in R, and let τ = <r_0,p_0><r_1,p_1> be a valid rewrite sequence. An exchange of the two applications is a new rewrite sequence τ′ of the form <r_1,p_2><r_0,p_3> such that, for all T, τ is applicable to T if and only if τ′ is applicable to T, and, when so, τ(T) = τ′(T).

If τ_1 and τ_2 are two rewrite sequences in R, τ_1 is a permutation of τ_2 if τ_1 can be obtained from τ_2 through a sequence of exchanges.

A rewrite sequence τ_1 is said to “loop” if it contains a proper prefix subsequence τ_2 such that, for some tree T, τ_1(T) = τ_2(T). All non-looping rewrite sequences can be reordered so that they proceed in a “bottom-up fashion”, namely, so that any rewrite application at a position p is preceded by all the rewrite applications having positions below p that can be reordered in that way. For example, the sequence of Figure 1.2 can be placed into bottom-up form by reordering the subsequence <r10,ε><r7,1><r5,1> as <r7,1><r5,1><r10,ε>. The rewrite sequence of Figure 1.3 is already in bottom-up form. The notion is formalized as follows:




Definition 2.2 Let T = op(T_1, …, T_n) be a tree in some input set L_i, and let τ be a rewrite sequence without loops transforming T into a tree T′. τ is in normal form at T if it is of the form τ_1 ··· τ_n τ_0, and (1) for 1 ≤ i ≤ n, τ_i only contains applications at positions within the subtree T_i, and the restriction of τ_i to the position i, (τ_i)@i, applied to T_i yields T′_i; (2) τ_0 applied to op(T′_1, …, T′_n) yields the output tree T′; and (3) there is no permutation of τ satisfying (1) and (2) in which some rewrites from τ_0 have been moved into τ_i for 1 ≤ i ≤ n. τ is in normal form (everywhere) if it is in normal form at T and (4) for 1 ≤ i ≤ n, (τ_i)@i is in normal form.
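The reordering example above, and the τ_1 ··· τ_n τ_0 shape required by Definition 2.2, amount to sorting applications by the post-order of their positions. The Python sketch below does exactly that with a stable sort, so applications at the same position keep their relative order. It assumes, without checking, that every swap it performs is a valid exchange in the sense of Definition 2.1, and it ignores condition (3); it illustrates the ordering only and is not a general normal-form algorithm.

```python
from functools import cmp_to_key


def postorder_cmp(p: tuple, q: tuple) -> int:
    """Negative if position p comes before position q in a post-order traversal."""
    for a, b in zip(p, q):
        if a != b:                 # different subtrees: the left one comes first
            return a - b
    return len(q) - len(p)         # one is an ancestor: the deeper one comes first


def bottom_up_order(seq):
    """Stable sort of (rule, position) pairs into post-order of their positions.

    Assumption: every swap performed is a valid exchange (Definition 2.1).
    This is not checked; in general some applications below p must stay
    after an application at p and then belong to p's local sequence.
    """
    return sorted(seq, key=cmp_to_key(lambda x, y: postorder_cmp(x[1], y[1])))
```

Applied to <r10,ε><r7,1><r5,1>, it yields <r7,1><r5,1><r10,ε>, the reordering given in the text.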

A normal form rewrite sequence for an input tree assigns to each position in the tree a “local” rewrite sequence: those rewrites done at that position. Formally:

Definition 2.3 Let τ be a normal form rewrite sequence for T of the form τ_1 ··· τ_n τ_0. The local rewrite sequence assigned by τ to a position p in T is defined by F(T, τ, p), where (1) F(T, τ, ε) is τ_0, and (2) if p is of the form i·q and T is of the form op(T_1, …, T_n), then F(T, τ, p) is F(T_i, (τ_i)@i, q). The local rewrite assignment of τ and T is the function assigning to each position in T its local rewrite sequence.

For example, the local rewrite sequence assigned by the rewrite sequence of Figure 1.3 at the root of the input tree is <r9,ε><r8,1><r11,ε>.

We can now define the k-BURS and BURS properties.

Definition 2.4 Let k be a positive integer, and let τ be a rewrite sequence in normal form for some input tree T. τ is in k-normal form if it is in normal form and each of the local rewrite sequences assigned by τ to the nodes of T is of length at most k.

Let R be a rewrite system over Op, let L_i and L_o be sets of trees over Op, and let k be a positive integer. The triple <R, L_i, L_o> is said to have the k-BURS property if, for any two trees T ∈ L_i and T′ ∈ L_o and any sequence τ in R with τ(T) = T′, there is a permutation of τ which is in k-normal form. The class BURS is composed of those triples <R, L_i, L_o> satisfying the k-BURS property for some positive integer k.
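One way to picture Definitions 2.3 and 2.4 in code is to store a normal-form sequence as a tree of local sequences mirroring the input tree. In this hypothetical representation (ours, not the paper's), each node keeps τ_0 with positions relative to itself, plus the normal forms of (τ_i)@i for its children. F(T, τ, p) is then a walk down the path p, k-normal form is a bound on every local sequence, and a post-order concatenation with the positions re-prefixed restores the full sequence τ_1 ··· τ_n τ_0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Application = Tuple[str, Tuple[int, ...]]   # (rule name, position)


@dataclass
class NormalForm:
    """A normal-form sequence tau_1 ... tau_n tau_0 for op(T_1, ..., T_n).

    Assumed representation: `local` is tau_0 with positions relative to this
    node; children[i-1] is the normal form of (tau_i)@i for subtree T_i.
    """
    local: List[Application]
    children: List["NormalForm"] = field(default_factory=list)


def local_sequence(nf: NormalForm, p: Tuple[int, ...]) -> List[Application]:
    """F(T, tau, p) of Definition 2.3."""
    if not p:                                          # F(T, tau, ε) = tau_0
        return nf.local
    return local_sequence(nf.children[p[0] - 1], p[1:])  # F(T_i, (tau_i)@i, q)


def flatten(nf: NormalForm) -> List[Application]:
    """Concatenate the local sequences in post-order, restoring full positions."""
    out: List[Application] = []
    for i, child in enumerate(nf.children, start=1):
        out += [(r, (i,) + pos) for r, pos in flatten(child)]
    return out + nf.local


def is_k_normal(nf: NormalForm, k: int) -> bool:
    """Definition 2.4: every local rewrite sequence has length at most k."""
    return len(nf.local) <= k and all(is_k_normal(c, k) for c in nf.children)
```

In such a structure, a root whose local sequence has three applications (for instance a commutation, an identity rewrite below it, and an addition, all with hypothetical rule names) makes the whole sequence 3-normal but not 2-normal.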

Since we are considering only fixed-goal problems in this paper, L_o is normally understood to be {goal}. If L_i is not specified, it is understood to be the set of all trees over the given set of symbols Op, which we denote as L_Op.

There are rewrite systems and sets of trees not in BURS. The rewrite system of Figure 2.1 with goal d is one example: the local rewrite sequence of the (unique) normal form rewrite sequence rewriting the input tree a(b(b(···b(c)···))) into d has a length that depends on the height of the input tree. In contrast, the example of Figure 1.1 with goal reg satisfies the BURS property for k = 3.

Testing membership in k-BURS is easy when both L_i and L_o are L_Op:

Proposition 2.1 Let R be a rewrite system over a set of operators Op, and let k be a positive integer. There is an algorithm that will determine whether <R, L_Op, L_Op> is in k-BURS.

Proof 

We can characterize the form of any local rewrite sequence at some position p in a tree. The first observation is that it must start with a rewrite application at position p because, otherwise, this first rewrite application would be assigned to the local rewrite sequence of a position below p. The second observation is similar but requires an additional notion. If ρ is a pattern, let T(ρ) denote the set of positions in ρ that do not correspond to a variable; we call these the positions “touched” by the pattern. If <r,p> is a rewrite application and r is α→β, we define T(<r,p>) to be the set of positions of the form p·q where q is in T(α) ∪ T(β). Finally, if τ is a rewrite sequence, define T(τ) as the union of all the sets T(<r,p>) for rewrite applications <r,p> in τ. The second observation now states that, if ω is a local rewrite sequence then, for any prefix of ω of the form τ·<r,p>, the position p must be in T(τ). As before, the reason is that otherwise <r,p> could be moved ahead and would belong to the local rewrite sequence of a position below p.
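The two observations in the proof translate into a simple necessary test on candidate local rewrite sequences. The sketch below assumes the tuple encoding of patterns used earlier and rules given as a dictionary from a name to an (α, β) pair. It computes the touched positions T(ρ), T(<r,p>) and T(τ), with positions taken relative to the node, and checks both conditions; it does not generate the candidates or decide k-BURS membership.

```python
from typing import Set, Tuple, Union

Pattern = Union[str, tuple]        # assumed encoding: bare strings are variables
Position = Tuple[int, ...]


def touched(rho: Pattern, prefix: Position = ()) -> Set[Position]:
    """T(rho): the positions of rho that are not variables."""
    if isinstance(rho, str):
        return set()
    out = {prefix}
    for i, child in enumerate(rho[1:], start=1):
        out |= touched(child, prefix + (i,))
    return out


def touched_by(app, rules) -> Set[Position]:
    """T(<r,p>) = { p·q | q in T(alpha) ∪ T(beta) } for r: alpha -> beta."""
    name, p = app
    alpha, beta = rules[name]
    return {p + q for q in touched(alpha) | touched(beta)}


def may_be_local(omega, rules) -> bool:
    """Necessary conditions from the proof of Proposition 2.1 (relative positions)."""
    if omega and omega[0][1] != ():
        return False                   # the first application must be at this node
    seen: Set[Position] = set()        # T(tau) for the prefix tau read so far
    for i, app in enumerate(omega):
        if i > 0 and app[1] not in seen:
            return False
        seen |= touched_by(app, rules)
    return True
```

For a hypothetical commutativity rule +(X,Y) → +(Y,X), only the node itself is touched, so a candidate local sequence that commutes at ε and then rewrites at position 1 is rejected.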

From the characterization, it follows that there is a finite number of local rewrite sequences of length no larger than k, and that they can be generated. Given a rewrite system R, <R, L_Op, L_Op> is in k-BURS if and only if there is no local rewrite sequence of length k+1, which can be determined by generating and testing all the candidates. □

It follows from the characterization used in the proof that every local rewrite sequence has a composition (which may not be in R).

The rewrite systems used to describe target machines are BURS.

Definition 2.5 Let α→β be a rewrite rule. We say that the rule is: an instruction fragment rule if α is a tree without variables and β is a (0-ary) symbol; a generic operator rule if α and β are op(X_1, …, X_n) and op′(X_1, …, X_n), for some n-ary symbols op and op′; a commutativity rule if α and β are op(X_1, …, X_n) and op(X_π(1), …, X_π(n)), for some n-ary operator op and some permutation π; and an identity rule if α and β are op(X, T) and X, for some tree T that has no variables.

A simple machine grammar is a rewrite system with only instruction fragment and generic operator rewrites².

In Figure 1.1, rules r1 to r7 are instruction fragment rules, and rules r10 and r11 are generic operator rules. Rule r9 is a commutativity rule, while rule r8 is an identity rule. The proof of Proposition 2.1 can be used to show:

Proposition 2.2 Simple machine grammars are in BURS. Machine grammars extended with commutativity and identity rules are in BURS.
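Definition 2.5 is purely syntactic, so the rule kinds can be recognized mechanically. The sketch below assumes the same tuple encoding (nodes as tuples, variables as bare strings); the operator names used with it are hypothetical, and rules matching none of the four shapes are reported as "other".

```python
from typing import Union

Pattern = Union[str, tuple]   # assumed encoding: bare strings are variables


def has_vars(p: Pattern) -> bool:
    return isinstance(p, str) or any(has_vars(c) for c in p[1:])


def is_flat(p: Pattern) -> bool:
    """True for op(X_1, ..., X_n) with n distinct variables (n may be 0)."""
    return (not isinstance(p, str)
            and all(isinstance(c, str) for c in p[1:])
            and len(set(p[1:])) == len(p) - 1)


def classify(alpha: Pattern, beta: Pattern) -> str:
    """Classify alpha -> beta according to Definition 2.5 (or 'other')."""
    if not has_vars(alpha) and not isinstance(beta, str) and len(beta) == 1:
        return "instruction fragment"
    if is_flat(alpha) and is_flat(beta) and len(alpha) == len(beta):
        if beta[1:] == alpha[1:]:
            return "generic operator"
        if beta[0] == alpha[0] and sorted(beta[1:]) == sorted(alpha[1:]):
            return "commutativity"
    if (not isinstance(alpha, str) and len(alpha) == 3 and isinstance(alpha[1], str)
            and not has_vars(alpha[2]) and beta == alpha[1]):
        return "identity"
    return "other"


def is_simple_machine_grammar(rules) -> bool:
    """Only instruction fragment and generic operator rules are allowed."""
    return all(classify(a, b) in ("instruction fragment", "generic operator")
               for a, b in rules)
```

For example, +(X,Y) → +(Y,X) is classified as a commutativity rule and +(X,0) → X as an identity rule, so a grammar containing either is not a simple machine grammar.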

A local rewrite assignment provides a decomposition of the original rewrite sequence: the concatenation of the local rewrite sequences of the input tree in post-order traversal order yields a permutation of the original rewrite sequence. This decomposition can be used to define our first notion of a state for solving REACHABILITY. The local rewrite graph (LR graph) of a tree T represents the local rewrite sequences of all normal form rewrite sequences applicable to T. For a rewrite system R and a goal G, we consider two sets of patterns: I_{R,G} are the patterns of interest at the beginning of local rewrite sequences, and O_{R,G} are the patterns of interest at the end of the local rewrite sequences and are used to construct members of I_{R,G} higher in the tree. EP_{R,G} is their union.

² For example, those used by Henry in [Hen84]. Henry handles commutativity explicitly by adding patterns. Identity rules are recognized by a peephole optimizer or prior to instruction selection.

Definition 2.6 If R is a rewrite system and G is the fixed goal, the extended pattern set of R and G, EP_{R,G}, is the union of the sets I_{R,G} (the inputs) and O_{R,G} (the outputs), defined constructively below.

(1) G belongs to O_{R,G}.

(2) For some input tree T, position p, and some normal form rewrite sequence, let τ be a local rewrite sequence with composition α*→β*. Let ρ be a pattern in O_{R,G} with variables renamed, if necessary, to be distinct from those in β*. If there is a substitution σ such that σ(β*) = σ(ρ), then σ(β*) is in O_{R,G}, σ(α*) is in I_{R,G}, and all the proper subtrees of σ(α*) belong to O_{R,G}.
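Rule (2) of Definition 2.6 needs a substitution σ with σ(β*) = σ(ρ). Because all patterns are linear and the variables of ρ are renamed apart, a most general such σ can be found by one simultaneous walk over both patterns, with no occurs check. The sketch below assumes the tuple encoding used earlier and that ρ's variables are also distinct from those of α*. It performs a single extension step, not the full fixpoint construction of EP_{R,G}, and as a simplification of ours it skips variable leaves when collecting proper subtrees.

```python
from typing import Optional, Union

Pattern = Union[str, tuple]   # assumed encoding: bare strings are variables


def unify_linear(a: Pattern, b: Pattern, sigma: Optional[dict] = None) -> Optional[dict]:
    """Most general unifier of two linear patterns with disjoint variables.

    Each variable occurs once and the variable sets are disjoint, so every
    variable is bound at most once and no occurs check is needed.
    """
    sigma = {} if sigma is None else sigma
    if isinstance(a, str):
        sigma[a] = b
        return sigma
    if isinstance(b, str):
        sigma[b] = a
        return sigma
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        if unify_linear(x, y, sigma) is None:
            return None
    return sigma


def substitute(p: Pattern, sigma: dict) -> Pattern:
    if isinstance(p, str):
        return sigma.get(p, p)
    return (p[0],) + tuple(substitute(c, sigma) for c in p[1:])


def proper_subtrees(p: Pattern) -> list:
    """Non-variable proper subtrees of p (variable leaves are skipped here)."""
    out = []
    if isinstance(p, str):
        return out
    for c in p[1:]:
        if not isinstance(c, str):
            out.append(c)
            out += proper_subtrees(c)
    return out


def extension_step(alpha_star: Pattern, beta_star: Pattern, rho: Pattern):
    """One use of rule (2): (new output, new input, new outputs) or None."""
    sigma = unify_linear(beta_star, rho)
    if sigma is None:
        return None
    s_alpha = substitute(alpha_star, sigma)
    return substitute(beta_star, sigma), s_alpha, proper_subtrees(s_alpha)
```

For the commutativity composition +(X,Y) → +(Y,X) and ρ = +(reg,Z), the step yields σ(β*) = +(reg,Z), σ(α*) = +(Z,reg), and the proper subtree reg.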

Now we can define LR graphs.

Definition 2.7 Let R be a rewrite system in BURS. The LR graph associated with a tree T is a graph G = (V, E) defined as follows.

Let A be the set of pairs <T_*, τ_0> such that there is a normal form rewrite sequence for T of the form τ_1 τ_2 ··· τ_n τ_0, τ_n(··· τ_2(τ_1(T)) ···) is T_*, and τ_0(T_*) is T′. For every local rewrite sequence τ such that there is a <T_*, τ> ∈ A, let α_τ → β_τ be its composition. If τ has n rewrite applications, let pre(τ,1), …, pre(τ,n) = τ be the prefix subsequences of τ. Let B be the set of trees of the form σ(β_τ) where, for some T_*, <T_*, τ> is in A, σ(α_τ) matches at T_*, and σ(β_τ) = σ(ρ) for some ρ in O_{R,G}. Finally, define B′ as a set of representatives of B under the equivalence relation between patterns.

For every pair <T_*, τ> ∈ A and every substitution σ with σ(β_τ) ∈ B′, σ(α_τ), σ(β_pre(τ,1)), …, σ(β_pre(τ,n)) are nodes in V, and there is an edge in E between each successive pair of them. There are no other nodes in V or edges in E. The nodes corresponding to σ(α_τ) are called the input nodes, and those corresponding to σ(β_τ) (those in B′) are called the output nodes. The remaining nodes, corresponding to σ(β_pre(τ,j)) for 1 ≤ j < n, are called the intermediate nodes.


Note that the union of all the input nodes in all the LR graphs gives I_{R,G}, the union of all the output nodes gives O_{R,G}, and the union of the input and the output nodes gives EP_{R,G}. Figure 2.2 shows an input tree and its associated LR graph for our rewrite system. The input trees are shown inside broken circles, while the output trees are in complete circles. The goal is reg.


The notion of an LR graph leads to the following procedure for solving REACHABILITY for rewrite systems in k-BURS:


1 Compute the LR graphs of all the subtrees of the input tree T.

2 If the goal G does not appear in the LR graph of T, then there is no rewrite sequence from T into G; i.e., T “blocks” [GlG78]. If G does appear, assign to each position of T a local rewrite sequence by applying steps 3 to 5 recursively, starting with T_in being T and T_out being G.

3 Select a local rewrite sequence for T_in = op(T_1, …, T_n) by selecting any path in the LR graph corresponding to a local rewrite sequence τ_0 from some input tree op(T′_1, …, T′_n) into an output tree ρ such that there is a substitution σ with σ(ρ) = σ(T_out).

4 Recursively apply (3) to input T_1 and goal T′_1, …, and to input T_n and goal T′_n.

5 Combine all the local rewrites in post-order to yield a normal-form rewrite sequence for T_in into T_out.

The procedure is non-deterministic since any path can be chosen in Step 3. The first step in the procedure and the test in the second step determine whether there is an appropriate rewrite sequence. The remainder of the procedure produces one such sequence.
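The procedure can be sketched in Python for a restricted case. The sketch assumes (this is our assumption, not the paper's setting) a normalized grammar: every rule either has the form op(s_1, …, s_n) → s with nullary symbols s_i and s, or is a chain rule s → s′; both kinds are instruction fragment rules in the sense of Definition 2.5. Instead of full LR graphs, each node records the set of symbols its subtree can be rewritten into, plus one way of obtaining each, which is a single entering edge per node in the spirit of UI LR graphs. Deeper patterns, commutativity, and identity rules are not handled, and all rule and symbol names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical normalized grammar: a rule is (name, op, operand_symbols, result).
# With op set, it is a one-level rule op(s_1, ..., s_n) -> result;
# with op None, it is a chain rule s -> result between nullary symbols.
Rule = Tuple[str, Optional[str], Tuple[str, ...], str]


def label(t: tuple, rules: List[Rule]):
    """Bottom-up pass (step 1): symbols each subtree can be rewritten into.

    One derivation is recorded per symbol: (rule name, operands, is_chain).
    """
    kids = [label(c, rules) for c in t[1:]]
    how: Dict[str, Optional[tuple]] = {}
    if len(t) == 1:
        how[t[0]] = None                       # a leaf already is its own symbol
    for name, op, args, res in rules:
        if (op == t[0] and len(args) == len(kids) and res not in how
                and all(a in k[0] for a, k in zip(args, kids))):
            how[res] = (name, args, False)
    changed = True
    while changed:                             # close under chain rules
        changed = False
        for name, op, args, res in rules:
            if op is None and args[0] in how and res not in how:
                how[res] = (name, args, True)
                changed = True
    return how, kids


def extract(lab, goal: str, pos: Tuple[int, ...] = ()) -> List[Tuple[str, tuple]]:
    """Top-down pass (steps 3-5): one rewrite sequence, in post-order."""
    how, kids = lab
    local, sym = [], goal
    while how[sym] is not None and how[sym][2]:     # walk back over chain rules
        name, (src,), _ = how[sym]
        local.append((name, pos))
        sym = src
    seq = []
    if how[sym] is not None:                        # the rule consuming the children
        name, args, _ = how[sym]
        for i, (kid, a) in enumerate(zip(kids, args), start=1):
            seq += extract(kid, a, pos + (i,))
        local.append((name, pos))
    return seq + local[::-1]


def solve(t: tuple, rules: List[Rule], goal: str):
    """Step 2: None if t blocks, otherwise one rewrite sequence from t to goal."""
    lab = label(t, rules)
    return extract(lab, goal) if goal in lab[0] else None
```

With hypothetical rules ld0: 0 → reg, ldc: Const → reg and add: +(reg,reg) → reg, solving +(0,+(Const,Const)) for goal reg returns the post-order sequence <ld0,1><ldc,2·1><ldc,2·2><add,2><add,ε>; a tree containing an operator that no rule consumes blocks.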

The first step assumes that it is possible to compute the LR graphs. An important special case when this is possible is when the extended pattern set is finite. In this case the collection of all the LR graphs that may be assigned to any subtree of any input tree is finite. We call this subclass of BURS finite BURS.

A semi-decision procedure for membership in finite k-BURS, in the case that the input set L_i is L_Op, first tests k-BURS and generates all the local rewrite sequences, using the characterization used in the proof of Proposition 2.1, and then tries to generate the extended pattern set, following Definition 2.6. If the procedure terminates, then the extended pattern set can be used to obtain the collection of all the LR graphs.

It is not difficult to see that the extended pattern set of machine grammars as defined in Definition 2.5 is finite. Thus:

Proposition 2.3 Every simple machine grammar is finite BURS. Machine grammars extended with commutativity and identity rewrites are also finite BURS.

If the rewrite system is finite BURS, then each one of the individual steps of the algorithm for solving fixed-goal REACHABILITY can be precomputed, stored into a table, and replaced, at REACHABILITY-solving time, by a table lookup. This leads to a typical “table generator plus solver” approach to REACHABILITY. The table generator computes all the possible LR graphs and stores their interactions into tables. The solver then consults them. By moving computation into the table generator, the solver can proceed very rapidly.

If the rewrite system is finite BURS, it is also possible to solve BLOCKING efficiently. If the input set is L_Op, there will be a blocking tree if there is an LR graph that does not contain the goal G as a node. If the rewrite system has the property that <R, L_Op, L_Op> is in finite BURS, and S is a recognizable [Tha73] subset of L_Op, we can find the LR graphs that are useful for trees in S, and we can find whether there is a tree in S for which R blocks. Both problems are solved by constructing the bottom-up tree automaton recognizing S and “running it against” the automaton computing LR graphs: see [Pel87] for details.
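For finite BURS, the table generator can enumerate every state reachable from some tree and tabulate the transitions, so that the solver only performs one lookup per node. The sketch below does this for the same assumed normalized grammar format as before, with states reduced to sets of reachable symbols (no paths or costs); operator arities are supplied by the caller. Because L_i = L_Op here, the last function also answers BLOCKING: a tree blocks exactly when its state lacks the goal, and every tabulated state is the state of some tree. The tables are stored naively, with no compression.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Assumed normalized grammar format: (name, op, operand_symbols, result);
# op None marks a chain rule s -> result.
Rule = Tuple[str, Optional[str], Tuple[str, ...], str]
State = FrozenSet[str]


def transition(op: str, kids: Tuple[State, ...], rules: List[Rule]) -> State:
    """State of op(T_1, ..., T_n) given the states of T_1, ..., T_n."""
    syms: Set[str] = {op} if not kids else set()     # a leaf is its own symbol
    for _, rop, args, res in rules:
        if rop == op and len(args) == len(kids) and all(a in s for a, s in zip(args, kids)):
            syms.add(res)
    changed = True
    while changed:                                   # chain-rule closure
        changed = False
        for _, rop, args, res in rules:
            if rop is None and args[0] in syms and res not in syms:
                syms.add(res)
                changed = True
    return frozenset(syms)


def generate_tables(arity: Dict[str, int], rules: List[Rule]):
    """Table generator: all reachable states and the full transition table."""
    states: Set[State] = set()
    table: Dict[Tuple[str, Tuple[State, ...]], State] = {}
    changed = True
    while changed:
        changed = False
        for op, n in arity.items():
            for kids in product(sorted(states, key=sorted), repeat=n):
                if (op, kids) not in table:
                    s = table[(op, kids)] = transition(op, kids, rules)
                    if s not in states:
                        states.add(s)
                        changed = True
    return states, table


def run(t: tuple, table) -> State:
    """Solver: one table lookup per node, bottom-up."""
    return table[(t[0], tuple(run(c, table) for c in t[1:]))]


def blocking_states(states, goal: str):
    """With L_i = L_Op, some tree blocks iff some reachable state lacks the goal."""
    return [s for s in states if goal not in s]
```

With only the rules 0 → reg, Const → reg and +(reg,reg) → reg, plus a Fetch operator that no rule consumes, four states arise; the empty state, reached by any tree containing Fetch, is the only blocking one.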


In general, an LR graph contains more than one path within the state that leads to an output tree. But it is only necessary to keep one alternative for solving REACHABILITY. Consequently, it is possible to use the same REACHABILITY algorithm and to replace the LR graph by any subgraph of it such that (i) it contains all the output nodes, (ii) every node has at most one entering edge, and (iii) for every output node there is at least one directed path, with all its nodes in the subgraph, from an input node to the output node. (Since all input nodes are reachable, all but one of them can be omitted.) Such a graph is called a uniquely invertible local rewrite graph (UI LR graph), and it is the second notion of state used to solve REACHABILITY.
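Selecting a UI LR graph amounts to keeping, for every node, one entering edge on some path from an input node. Breadth-first search from the input nodes is one simple way to do this. It is our illustrative choice, not the heuristic of Section 4, and it makes no attempt to share UI LR graphs between states. Nodes may be any hashable labels, for example pattern strings.

```python
from collections import deque
from typing import Hashable, Iterable, Optional, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def ui_subgraph(edges: Iterable[Edge], inputs, outputs) -> Optional[Set[Edge]]:
    """Keep one entering edge per node via breadth-first search from the inputs.

    Condition (ii) holds by construction; (i) and (iii) hold whenever every
    output node is reachable from an input node, and None is returned
    otherwise. Nodes not needed for any output are not pruned.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    parent = {i: None for i in inputs}      # input nodes get no entering edge
    queue = deque(parent)
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in parent:             # the first edge found into v is kept
                parent[v] = u
                queue.append(v)
    if any(o not in parent for o in outputs):
        return None
    return {(u, v) for v, u in parent.items() if u is not None}
```

On a diamond-shaped graph with two paths from the input to one output node, only the first path discovered keeps its edge into the output node.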

Since the same UI LR graph may be a subgraph of several different LR graphs, choosing the UI LR graphs carefully may allow a reduction in the number of states needed. For example, in Figure 2.2 there are many different ways of obtaining reg; any one of them is good enough. If the path starting from +(amode,amode) is selected, this state could also be used for many other trees including, for example, +(0,Reg) and +(Reg,0). Unfortunately, selecting the UI LR graphs so as to minimize the total number required is a complex problem.

Proposition 2.4 Given a rewrite system R over Op and a set of trees L_o over Op, with <R, L_Op, L_o> in finite BURS, the MINIMUM UI LR GRAPH problem consists of assigning to each LR graph a valid UI LR graph such that the number of UI LR graphs used is minimum. MINIMUM UI LR GRAPH is NP-complete.

Proof by reduction from MINIMUM COVER [GaJ80]; quite straightforward, see [Pel87]. □

The selection process is further complicated because selecting some paths in an LR graph may make some trees in the graph “useless” for solving REACHABILITY, which may open new opportunities for making graphs equivalent. Section 4 below describes a heuristic used to select the UI LR graphs, as well as the table representation used by the solver. Detection of useless nodes in the graphs can be done by a simple iteration process.

REACHABILITY can be used in several applications; the rest of this paper shows how to modify the algorithm to solve the C-REACHABILITY problem.


3.  Solving  C-REACHABILITY 

A first approach to solving C-REACHABILITY would be to enrich the notion of an LR graph by using a graph whose nodes are not patterns but pairs (p, cost), where cost represents the minimum cost to reach pattern p, and whose edges correspond to rewrite applications along paths of minimal cost. Such an approach works correctly but leads to an unbounded number of states and thus to costly solver-time operations. A better solution is to store, instead of the total cost needed to reach p, only the delta cost. The delta cost is obtained by subtracting, from the cost associated with each pattern, the smallest cost associated with any pattern in the LR graph. Since the cost of a sequence is the sum of the costs of all the rewrites in the sequence, every alternative in the graph is shifted by the same amount, so choosing a rewrite sequence based on the delta cost yields the same solution as choosing one based on full costs, yet the number of states will be smaller. The resulting notion is called a δ-LR graph.
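As a minimal sketch of the normalization, assume a state is represented as a dictionary from pattern names to the minimal full cost of reaching them (the names `delta_normalize`, `reg`, `addr`, and `imm` are hypothetical). Subtracting the smallest cost splits a state into a base cost and a δ state, and subtrees whose full costs differ only by a constant collapse to the same δ state:

```python
from typing import Dict, Tuple

# Hypothetical representation: a state maps each pattern name to the
# minimal (full) cost of rewriting the input subtree into that pattern.
FullState = Dict[str, int]
DeltaState = Tuple[Tuple[str, int], ...]


def delta_normalize(state: FullState) -> Tuple[int, DeltaState]:
    """Split a full-cost state into (base cost, delta-cost state).

    The base is the smallest cost in the state; every pattern keeps only
    its excess over that base, so the delta state no longer depends on
    how expensive the subtree was overall.
    """
    base = min(state.values())
    delta = tuple(sorted((p, c - base) for p, c in state.items()))
    return base, delta


# Two subtrees with very different absolute costs...
small = {"reg": 1, "addr": 2, "imm": 3}
large = {"reg": 41, "addr": 42, "imm": 43}

# ...yield the same delta state, so they can share one automaton state.
print(delta_normalize(small))  # (1, (('addr', 1), ('imm', 2), ('reg', 0)))
print(delta_normalize(large))  # (41, (('addr', 1), ('imm', 2), ('reg', 0)))
```

Only cost differences inside a state decide which alternative is cheaper, and the subtraction preserves those differences.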

The delta costs can be computed without first computing the full minimal cost for each pattern. The delta costs of all patterns in the graph can be computed from the delta costs of the input nodes, which are determined by the delta costs of the output nodes of other states.
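The following sketch illustrates computing a node's δ state purely from its children's δ states. It assumes a restricted, machine-grammar-like rule format (base rules `lhs <- op(kids)` and chain rules `lhs <- rhs`, in the style of BURG-like labelers) rather than the full BURS construction; the function `transition` and the rule tuples are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical rule encodings (a machine-grammar-like subset of BURS):
#   base rule  (lhs, op, kid_nts, cost): rewrites op(kids) into lhs
#   chain rule (lhs, rhs, cost):         rewrites rhs into lhs
BaseRule = Tuple[str, str, Tuple[str, ...], int]
ChainRule = Tuple[str, str, int]
Delta = Dict[str, int]


def transition(op: str, kids: List[Delta], base_rules: List[BaseRule],
               chain_rules: List[ChainRule]) -> Delta:
    """Delta-cost state of op(kids), computed only from the kids' deltas.

    Using kid deltas instead of full costs is safe here: each kid's
    dropped base cost is added equally to every base-rule alternative.
    """
    cost: Dict[str, int] = {}
    for lhs, rop, kid_nts, c in base_rules:
        if rop == op and len(kid_nts) == len(kids) and all(
                nt in k for nt, k in zip(kid_nts, kids)):
            total = c + sum(k[nt] for nt, k in zip(kid_nts, kids))
            if lhs not in cost or total < cost[lhs]:
                cost[lhs] = total
    # Close under chain rules; non-negative costs make this terminate.
    changed = True
    while changed:
        changed = False
        for lhs, rhs, c in chain_rules:
            if rhs in cost and (lhs not in cost or cost[rhs] + c < cost[lhs]):
                cost[lhs] = cost[rhs] + c
                changed = True
    if not cost:
        return {}  # no rule applies: the tree cannot be rewritten
    low = min(cost.values())
    return {nt: c - low for nt, c in cost.items()}


BASE = [("imm", "Const", (), 0),
        ("reg", "Plus", ("reg", "reg"), 1),
        ("reg", "Plus", ("reg", "imm"), 1)]
CHAIN = [("reg", "imm", 1)]

leaf = transition("Const", [], BASE, CHAIN)
plus = transition("Plus", [leaf, leaf], BASE, CHAIN)
print(leaf)  # {'imm': 0, 'reg': 1}
print(plus)  # {'reg': 0}
```

Because the result depends only on `op` and the children's δ states, a table generator can precompute it for every combination of states.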

There is no guarantee that a rewrite system that is in finite-BURS will, when extended with costs, have a finite number of δ-LR graphs. Consider, for example, the rewrite system of Figure 3.1, where the cost of each rule is shown below it.

This rewrite system contains two separate sets of rewrite rules: those involving “imodes” and those involving “amodes”. Now consider an input tree containing some number of “+” operators and some number of “Fetch” operators.

Figure 3.1: An Unbounded Number of δ-LR Graphs

Whether “imode” rewrites or “amode” rewrites are cheaper depends on the relationship between the number of “+” operators and the number of “Fetch” operators in the tree. Recording this information requires an unbounded number of states.

Fortunately, the above situation is uncharacteristic of “real” machine descriptions. Real machine descriptions have a symbol, corresponding to the notion of a “register”, that plays a central role: all trees (except perhaps a few) can be rewritten into “register” in a small number of rewrites, and “register” can be rewritten into all trees (except perhaps a few), also in a small number of rewrites. This provides a “triangle inequality”: any tree in the LR graph can be reached from the cheapest one by passing through “register”, so its delta cost is bounded by the cost of those few rewrites. This keeps the delta costs associated with the trees in the LR graph close together and leads to a finite number of δ-LR graphs. See [Pel87] for one possible formalization of this argument.

Testing whether there is a finite number of δ-LR graphs can be done as part of the generation of the graphs. Conceptually, the procedure can be understood as first generating the LR graphs, then annotating them with costs and generating more δ-LR graphs until no new graph is found. This procedure will terminate if there is a finite number of δ-LR graphs, but will fail to do so if there is an infinite number of them.
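A minimal sketch of this closure procedure, assuming states are hashable values and a `transition(op, kids)` function is supplied by the caller (both hypothetical here). Since the real procedure simply fails to terminate when the set of states is infinite, the sketch adds an explicit `limit` as a practical stand-in:

```python
from itertools import product
from typing import Callable, Dict, Hashable, Optional, Set, Tuple

State = Hashable


def generate_states(leaves: Dict[str, State],
                    operators: Dict[str, int],
                    transition: Callable[[str, Tuple[State, ...]], State],
                    limit: int = 1000) -> Optional[Set[State]]:
    """Enumerate reachable states until no new one appears.

    Returns the set of states once the closure is reached, or None when
    more than `limit` states appear (treated as "possibly infinite").
    """
    states: Set[State] = set(leaves.values())
    changed = True
    while changed:
        changed = False
        for op, arity in operators.items():
            # Snapshot the current states; new ones are tried next round.
            for kids in product(list(states), repeat=arity):
                s = transition(op, kids)
                if s not in states:
                    states.add(s)
                    changed = True
                    if len(states) > limit:
                        return None
    return states


# Toy states: an integer delta between two competing patterns.
bounded = generate_states({"Const": 0}, {"Fetch": 1},
                          lambda op, k: min(k[0] + 1, 3))
unbounded = generate_states({"Const": 0}, {"Fetch": 1},
                            lambda op, k: k[0] + 1, limit=50)
print(sorted(bounded))  # [0, 1, 2, 3]
print(unbounded)        # None
```

The unbounded toy transition mimics the situation of Figure 3.1, where each additional operator can push the delta cost further.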

If there is a finite number of states, then it is possible to apply the same algorithm used for REACHABILITY to solve C-REACHABILITY very efficiently. Unfortunately, a single LR graph may be replaced by several δ-LR graphs, which may lead to a substantially larger number of states. The number of states needed can be reduced, in a way similar to that of the previous section, by using δ-UI LR graphs instead of δ-LR graphs. In addition, one can observe that the costs associated with the δ-LR graphs (and the δ-UI LR graphs) are not used in solving C-REACHABILITY: they are used to compute the states but, after that, only the graph structure is used, without the cost information. Since two different δ-(UI) LR graphs may have the same structure, two different states may be equivalent. The minimal number of states needed can be computed by a variation of the standard algorithm for minimizing a bottom-up tree automaton, which is, in turn, a variation of the minimization of a finite-state automaton.
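To illustrate the idea, here is a sketch of Moore-style partition refinement for a complete bottom-up tree automaton (every transition defined). It assumes each state carries a `label` standing for what the solver actually uses (the cost-free graph structure); states start grouped by label and are split until their transitions agree. The names `minimize` and `_signature` are hypothetical, and this is a generic technique rather than the exact variation used by the authors:

```python
from itertools import product
from typing import Dict, Hashable, List, Tuple

State = Hashable
# ops[op] = (arity, table); table maps a tuple of operand states to a state.
Ops = Dict[str, Tuple[int, Dict[Tuple[State, ...], State]]]


def _signature(s: State, block: Dict[State, int],
               states: List[State], ops: Ops) -> Tuple[int, ...]:
    """Current block of s plus the blocks of every transition s takes part in."""
    sig = [block[s]]
    for op in sorted(ops):
        arity, table = ops[op]
        for pos in range(arity):
            for others in product(states, repeat=arity - 1):
                args = others[:pos] + (s,) + others[pos:]
                sig.append(block[table[args]])
    return tuple(sig)


def minimize(states: List[State], label: Dict[State, Hashable],
             ops: Ops) -> Dict[State, int]:
    """Map each state to its equivalence class (Moore-style refinement)."""
    # Start by grouping states whose labels (cost-free structure) agree.
    ids = {lab: i for i, lab in enumerate(sorted(set(label.values()), key=repr))}
    block = {s: ids[label[s]] for s in states}
    while True:
        sigs = {s: _signature(s, block, states, ops) for s in states}
        new_ids = {sg: i for i, sg in enumerate(sorted(set(sigs.values())))}
        # Each signature starts with the old block, so blocks only split;
        # an unchanged block count means the partition is stable.
        if len(new_ids) == len(set(block.values())):
            return {s: new_ids[sigs[s]] for s in states}
        block = {s: new_ids[sigs[s]] for s in states}


labels = {"A": "g1", "B": "g1", "C": "g2"}
merge = {"Fetch": (1, {("A",): "C", ("B",): "C", ("C",): "C"})}
split = {"Fetch": (1, {("A",): "C", ("B",): "B", ("C",): "C"})}
print(minimize(["A", "B", "C"], labels, merge))  # {'A': 0, 'B': 0, 'C': 1}
print(minimize(["A", "B", "C"], labels, split))  # {'A': 1, 'B': 0, 'C': 2}
```

In the first automaton A and B have the same structure and behave identically, so they merge; in the second their transitions diverge, so they stay distinct.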

4.  A  Code  Generator  Generator 

We have implemented a code generator generator that works by solving C-REACHABILITY using BURS theory. The table generator implementation is stand-alone and emphasizes generating small tables, with no great effort spent in trying to generate them fast. It has been running since late 1986. The implementation is based on the theory presented in the previous two sections, with some modifications. The δ-LR graphs are first generated using an extension of David Chase's algorithm for bottom-up pattern matchers [Cha87]; then useless information is removed and δ-UI LR graphs are selected. Since optimal selection of δ-UI LR graphs is difficult, the selection is done by a process which starts from δ-LR graphs and attempts to make graphs identical by removing some alternatives, determining which nodes in the graphs are useless, removing them, and repeating the process. In our experiments, the number of states stabilizes in two or three iterations.

The modified version of Chase's algorithm generates a representation of the bottom-up tree automaton computing the states (representing the δ-UI LR graphs) as a collection of “folded” tables, one for each n-ary operator, in which identical (n−1)-hyperplanes have been found and shared. Our implementation accepts only 0-, 1-, and 2-ary operators. In the case of binary operators, the representation is a table in which identical rows and columns have been found; we call the 1-dimensional arrays indicating identical rows and columns restrictors. We bit-encode the restrictors and share them across different tables.
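For a binary operator, the folding described above can be sketched as follows, assuming the transition table is a dense, non-empty list of lists of state numbers (the helper names are hypothetical, and bit-encoding is not shown). Identical rows and identical columns are numbered, the resulting restrictors map operand states to folded indices, and identical restrictors are interned so different tables can share them:

```python
from typing import Dict, List, Tuple

Table = List[List[int]]
Vector = Tuple[int, ...]


def _classify(vectors: List[Vector]) -> Tuple[List[int], List[Vector]]:
    """Number the distinct vectors in order of first appearance."""
    index: Dict[Vector, int] = {}
    restrictor = [index.setdefault(v, len(index)) for v in vectors]
    return restrictor, list(index)


def fold(table: Table) -> Tuple[List[int], List[int], Table]:
    """Fold a non-empty 2-D transition table by merging identical rows/columns.

    Returns (row_restrictor, col_restrictor, folded) such that
    table[l][r] == folded[row_restrictor[l]][col_restrictor[r]].
    """
    rows, distinct_rows = _classify([tuple(row) for row in table])
    cols, distinct_cols = _classify(list(zip(*table)))
    # One representative original column index per column class.
    reps = [cols.index(k) for k in range(len(distinct_cols))]
    folded = [[row[c] for c in reps] for row in distinct_rows]
    return rows, cols, folded


def share(pool: Dict[Vector, Vector], restrictor: List[int]) -> Vector:
    """Intern a restrictor so identical ones are stored once for all tables."""
    key = tuple(restrictor)
    return pool.setdefault(key, key)


# Hypothetical transition table for a binary operator over 3 states.
T = [[1, 1, 2],
     [1, 1, 2],
     [3, 3, 4]]
rows, cols, folded = fold(T)
print(rows, cols, folded)  # [0, 0, 1] [0, 0, 1] [[1, 2], [3, 4]]
pool: Dict[Vector, Vector] = {}
print(share(pool, rows) is share(pool, cols))  # True
```

A lookup then costs two restrictor reads and one read of the small folded table, with no search or default-transition logic.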

The code generator uses the automaton tables and also a second set of tables that encodes each δ-UI LR graph; the costs are not stored since they are unused. This second set of tables is encoded using a technique similar to that in YACC [Joh78], by overlaying rows of information. The table generator uses a few simple heuristics to reduce the size of these tables; see [Pel87] for details.
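Row overlaying can be sketched with the common first-fit displacement scheme (a `base` offset per row, a shared `value` vector, and a `check` vector recording which row owns each slot). This is an assumed, generic version of the technique, not the exact heuristics of the table generator; `overlay` and `lookup` are hypothetical names:

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[int, int]  # sparse row: column -> value


def overlay(rows: List[Row]) -> Tuple[List[int], List[Optional[int]], List[Optional[int]]]:
    """Pack sparse rows into one vector by first-fit displacement.

    Returns (base, value, check): row r's entry for column c lives at
    value[base[r] + c], and check[...] == r confirms the slot belongs to r.
    """
    value: List[Optional[int]] = []
    check: List[Optional[int]] = []
    base: List[int] = []
    for r, row in enumerate(rows):
        offset = 0
        # Slide the row right until none of its entries hits a used slot.
        while any(offset + c < len(check) and check[offset + c] is not None
                  for c in row):
            offset += 1
        need = offset + (max(row) + 1 if row else 0)
        value.extend([None] * (need - len(value)))
        check.extend([None] * (need - len(check)))
        for c, v in row.items():
            value[offset + c] = v
            check[offset + c] = r
        base.append(offset)
    return base, value, check


def lookup(base: List[int], value: List[Optional[int]],
           check: List[Optional[int]], r: int, c: int) -> Optional[int]:
    """Return the entry at (r, c), or None if row r has no entry there."""
    i = base[r] + c
    return value[i] if i < len(check) and check[i] == r else None


rows = [{0: 5, 2: 7}, {1: 8}, {0: 9, 1: 4}]
base, value, check = overlay(rows)
print(base, len(value))                  # [0, 0, 3] 5
print(lookup(base, value, check, 2, 1))  # 4
print(lookup(base, value, check, 1, 0))  # None
```

Three rows of width 3 fit in 5 slots instead of 9, at the price of one extra `check` comparison per lookup.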

A final departure from the theory of the previous section is that the problem the code generator really wants solved is not C-REACHABILITY. Each rewrite rule of the rewrite system given to the table generator has, in addition to a cost, a call to a semantic routine. What the code generator uses is not a rewrite sequence of minimum cost, but its associated sequence of semantic routine calls. We call this problem UCODE. The minimum number of states needed to solve UCODE can be found using a minimization method like the one mentioned in the previous section.
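As an illustration of what a UCODE solver produces, the sketch below assumes a labeling pass has already recorded, for each node and each goal pattern, which rule to apply (in BURS this choice would be read off the node's state). A top-down walk then emits the semantic routine calls bottom-up. All names, including the routine names `operand`, `load_imm`, and `add_imm`, are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Rule:
    lhs: str                          # pattern the rule produces
    action: str                       # hypothetical semantic routine name
    kid_nts: Tuple[str, ...] = ()     # operand goals for a base rule
    chain_from: Optional[str] = None  # set for a chain rule lhs <- chain_from


@dataclass
class Node:
    op: str
    kids: List["Node"] = field(default_factory=list)
    # Rule chosen for each goal pattern, filled in by a labeling pass.
    rule_for: Dict[str, Rule] = field(default_factory=dict)


def reduce_tree(node: Node, goal: str, emit: Callable[[str], None]) -> None:
    """Walk the chosen rules top-down, calling semantic routines bottom-up."""
    rule = node.rule_for[goal]
    if rule.chain_from is not None:
        # A chain rule first rewrites the same node into chain_from.
        reduce_tree(node, rule.chain_from, emit)
    else:
        for kid, nt in zip(node.kids, rule.kid_nts):
            reduce_tree(kid, nt, emit)
    emit(rule.action)


# Plus(Const, Const) with rule choices a labeler might have made.
operand = Rule("imm", "operand")
load = Rule("reg", "load_imm", chain_from="imm")
add = Rule("reg", "add_imm", kid_nts=("reg", "imm"))
c1 = Node("Const", rule_for={"imm": operand, "reg": load})
c2 = Node("Const", rule_for={"imm": operand, "reg": load})
tree = Node("Plus", [c1, c2], rule_for={"reg": add})

calls: List[str] = []
reduce_tree(tree, "reg", calls.append)
print(calls)  # ['operand', 'load_imm', 'operand', 'add_imm']
```

Only the sequence of routine calls leaves the solver, which is why states that differ solely in costs or in unobservable rewrites can be merged.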

The BURS code generator has been operative since early 1987, integrated into UW-CODEGEN [HeD87], a testbed for table-driven code generators developed by Robert Henry at the University of Washington. UW-CODEGEN does temporary and register management and includes the following code generators:

GG  A code generator based on Graham-Glanville technology [GlG78];

BU  A locally optimal code generator based on bottom-up pattern matching, manipulating states similar to LR graphs but with costs represented explicitly and computed with a dynamic programming algorithm; and

TD  A locally optimal code generator based on top-down pattern matching technology and manipulating costs explicitly with a dynamic programming algorithm. The trees in a state are listed explicitly.

TD and BU were implemented by Damron and Henry, respectively, and were developed independently of the BURS-based code generator presented in this paper. The theory behind TD is similar to that used in twig [AGT86] and in the top-down algorithms of Weisgerber/Wilhelm [WeW86]. The theory behind BU is related to BURS and to the bottom-up algorithm described by Weisgerber/Wilhelm [WeW86]. The big advantage of the UW-CODEGEN testbed is that it facilitates meaningful code generator comparisons.

We have tested the table constructor with several machine descriptions that were developed at UC Berkeley as part of the CODEGEN effort [AGH84]. This paper only reports on two machine descriptions that were made available by Robert Henry: a VAX-11 description and a Motorola MC68000 description; for technical reasons, they are the only ones that we can use to generate code with UW-CODEGEN. The machine descriptions used are machine grammars without generic operator rewrite rules. The cost assigned to each rule is a 4-tuple indicating the numbers of memory bytes referenced, instructions issued, side effects issued, and operands in the instruction fragment represented by the rule. The tuple leads to 6 “natural” costs: a constant cost, each of the 4 elements considered separately, and a lexicographic ordering on the full tuple. We will denote the 6 costs as K, M, I, S, O, and L, respectively. The GG implementation disregards the cost information; the BU and TD implementations always use the full lexicographic cost. BURS currently can use any of the 6 costs except L.
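The six cost functions can be illustrated with a small sketch, assuming each rule's cost is stored as a tuple (M, I, S, O) and that the constant cost K charges 1 per rule (an assumption; the text only says "constant"). For L, the sketch sums tuples componentwise and relies on Python's lexicographic tuple comparison. The rule tuples are invented for illustration:

```python
from typing import Callable, Dict, List, Tuple, Union

# Per-rule cost tuple (M, I, S, O): memory bytes referenced, instructions
# issued, side effects issued, operands in the instruction fragment.
CostTuple = Tuple[int, int, int, int]

PROJECTIONS: Dict[str, Callable[[CostTuple], int]] = {
    "K": lambda t: 1,   # assumption: the constant cost charges 1 per rule
    "M": lambda t: t[0],
    "I": lambda t: t[1],
    "S": lambda t: t[2],
    "O": lambda t: t[3],
}


def sequence_cost(rules: List[CostTuple],
                  metric: str) -> Union[int, Tuple[int, ...]]:
    """Cost of a rewrite sequence under one of the metrics K, M, I, S, O, L."""
    if metric == "L":
        # Sum componentwise; Python compares the resulting tuples
        # lexicographically, which gives the L ordering.
        return tuple(sum(col) for col in zip(*rules)) if rules else (0, 0, 0, 0)
    return sum(PROJECTIONS[metric](t) for t in rules)


# Two hypothetical rewrite sequences covering the same tree.
seq_a = [(4, 1, 0, 2), (0, 1, 0, 1)]
seq_b = [(0, 1, 0, 2), (0, 1, 0, 2), (0, 1, 0, 1)]
for metric in ["K", "M", "I", "L"]:
    a, b = sequence_cost(seq_a, metric), sequence_cost(seq_b, metric)
    print(metric, a, b, "a" if a <= b else "b")
# K and I prefer seq_a; M and L prefer seq_b.
```

Different metrics can pick different sequences for the same tree, which is one reason the number of states varies so much with the cost function.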

The two principal measures of interest are table size and code generation speed. Table size is related to the number of states needed to solve UCODE, which depends on the cost function used and the method of state construction. Figure 4.1 shows, for the two machine descriptions mentioned, the number of δ-LR graphs that are generated initially and the final number of states needed. The table shows the large variation in the number of states needed: the constant cost function (K) requires few states, while the function that counts memory references (M) requires many. The lexicographic cost would produce a larger number of states but, due to implementation restrictions in our exploratory implementation of the table generator, its tables cannot be generated. An approximation to the lexicographic cost produces tables slightly larger than the largest obtained using a single component. The table also shows that in this example our heuristic to reduce the number of states obtains a significant reduction; we have obtained larger reductions with other machine descriptions.
Henry and Damron report in detail on the table sizes for GG, BU, and TD [HeD87]. Figure 4.2 shows the table size for BURS for the different cost metrics. For each machine description and each cost function there are three numbers, listed from the top: the space used to represent the bottom-up tree automaton, the space used to represent the states themselves, i.e., their internal nodes and edges, and the total space. Note that the major variation is in the size of the bottom-up tree automaton. The bottom of the figure shows the influence of the representation of the restrictors on the table size. The three columns indicate the restrictor size, the automaton size, and the total table size. Sharing identical restrictors is a very simple optimization and a big win; bit-encoding the restrictors does not seem to significantly slow the UCODE solver.

Figure 4.3 compares the table sizes for several code generators in UW-CODEGEN. The values for BU, TD, and GG are taken from [HeD87]; the line labelled “states” is the space for the patterns, replacements, costs, and actions; the line labelled “fsa” corresponds to different notions of automaton. The UW-CODEGEN values are estimated from bar charts. The values for BURS are for the M cost function, which is the one requiring the largest tables. BURS-l and BURS-f represent different versions of the table generator. BURS-l is an approximation to L (note that M is the first component), while BURS-f uses M as cost function but tries to generate the tables fast rather than spending much time generating small tables. Again there are three numbers per combination of machine description and cost function: the size of the automaton, the size of the states, and their sum. Chase's technology would provide a substantially smaller B-fsa than the one used in BU. According to Chase [Cha87], a reasonable value is in the vicinity of 23K; this would place the total table size very close to GG and BURS.
We use the same set of 6 programs used in [HeD87] to measure the performance of the code generator. These are C programs ranging in size from 100 to 1200 lines. Figure 4.4 shows, for each target, three values averaged over the 6 programs: the time spent solving UCODE normalized to GG, the percentage of code generation time spent solving UCODE, and the total code generation time normalized to UCODE. (All measurements were made on a VAX 8600; only “user” time is considered.)

BURS is substantially faster than TD and BU because manipulating costs is expensive: costs have to be combined, computed, and compared. It is more surprising that BURS is even faster than GG. A careful comparison of the respective portions of code implementing UCODE showed several causes for the difference in speeds. Probably the biggest contribution lies in the representation of the automaton: GG uses a tight encoding and a cache, which loses in speed against the more efficient table folding. In addition, GG uses the technique, standard in parsing technology, of default transitions, which is slower than a simple lookup. Another contributor is that the relationship between the parser used in GG and the traversal of the tree providing the prefix traversal is not as simple as the tree traversal used by BURS. Finally, GG stores states and other information in a stack (the parse stack), while BURS uses (pre-allocated) slots associated with the tree; the stack requires extra checks for overflow and the like. GG also uses a few more indirect routine calls than BURS. Despite the difficulty in comparing the methods in the presence of these differences in implementation strategy, we think that the evidence shows that BURS is, at least, comparable in speed to GG. To reduce effects caused by compilation of the algorithms, the values shown in Figure 4.4 correspond to GG compiled using the peephole optimizer, and BURS without it; the values are more favorable to BURS otherwise.
The quality of the generated code is measured statically using the same metric that we have discussed earlier: the 4-tuple of values. Figure 4.5 shows the average cost, normalized to 100 for BU and TD. BU and TD have a small error that shows up very infrequently and causes some normalized values to be under 100.00. The quality of the code generated by BURS-l is quite close to the lexicographic optimum.

The time spent generating the BURS tables depends on the cost function and on the effort spent trying to generate small tables. The top of Figure 4.6 shows times in seconds on a Sun-3/75 with 12 MB of main memory and no local disk. The bottom of the figure reproduces information from [HeD87] comparing the performance of the different table generators in UW-CODEGEN; values are in seconds on a DEC MicroVAX II. There are two columns for BU: the first column corresponds to the generation of tables without any effort to use cost information at table-generation time to reduce the number of alternatives to consider at code generation time; the second column corresponds to the tables used in our other comparisons, in which some elimination of alternatives is done based on costs. We want to emphasize that the current implementation of the table generator for BURS was written with no special effort to generate tables fast.

5.  Other  Related  Work  and  Conclusions 

The idea behind the algorithm for REACHABILITY has been around for a while; perhaps the earliest references are the dynamic programming algorithms of [AUJ77] and [Rip77]. BURS theory differs from these early proposals in that it is based on rewrite systems, it can handle a larger class of rewrite systems, and it emphasizes the computability of the states by a bottom-up finite-state automaton. Our theory was developed independently of the work of Weisgerber/Wilhelm [WeW86] and Henry/Damron [HeD87]; it differs from the work of those researchers in its ability to encode cost information into the δ-BURS states and in handling a larger class of rewrite systems. Our work on optimal code generation yields results similar to those claimed by Hatcher and Christopher [HaC] [Hat85], but while the Hatcher/Christopher technique requires modifying some parts of the machine description to retain optimality, the approach described here will always be optimal, provided that a finite number of states exists. We suspect that the Hatcher/Christopher technique can be explained as a simplification of BURS theory.

Probably the best-known implementation of locally optimal code generation is the one used for twig [AGT86]. The theory behind that implementation is quite similar to the one used in TD, with two differences. The first difference is that the implementation of twig reported in [AGT86] does more computation at solving time than TD. Thus, twig has smaller tables and smaller table generation times, but larger code generation times. The second difference is in the phase organization. Both twig and UW-CODEGEN perform two types of transformations: some transformations are for normalization and simplification, like the mapping of short-circuit booleans into compares and jumps; the others are the ones discussed in this paper and correspond to the machine instructions. Twig deals with both types of transformations together in a single mechanism, but the interaction of the machine rewrite applications with the simplification routines allows looping and non-optimal transformations to occur. UW-CODEGEN first performs the normalization and simplification and then the machine rewrite applications, but allows the simplification routines to query the machine description to make decisions. The current implementation of the simplification routines in UW-CODEGEN is pattern-driven and a bit inefficient. A new version recently written by Henry [Hen87] is faster and seems easier to program. Although we don't have specific measures comparing our approach and twig, it is safe to say that BURS-based code generation is substantially faster than one based on twig. The results of Henry/Damron [HeD87] also suggest that if one were to model the code generation in a way similar to the one used in twig, a bottom-up pattern matcher could be faster than the currently used top-down pattern matcher; the work of Chase [Cha87] and our own shows that the space penalty is manageable.


We have shown the potential for BURS-based fast optimal code generation for expression trees. The main advantage of optimality is that, as long as the machine description is accurate, there is no need for the machine description writer to understand the theory used to generate the code generator. A non-optimal technique like GG generates optimal code for a uniform instruction set such as those found on RISC machines [Pat85]. It can generate quite good code otherwise (see Figure 4.5) if the machine description is carefully written [Hen84].

REACHABILITY problems can be used in several other applications. Projection Systems [Pel87] are a descriptive mechanism for tree transformation that is similar to tree-to-tree grammars [KMP84], and can be used, for instance, to describe the mapping between parse trees and abstract syntax trees. Forward and backward applications of projection systems can be reduced to REACHABILITY problems. X-patterns [Pel87] are an extension of traditional patterns to describe non-local conditions. Pattern matching of X-patterns can be reduced to a REACHABILITY problem.

Our current research in the area includes exploring faster algorithms for table generation and testing of δ-BURS for any recognizable input set. We are also working on other applications of REACHABILITY.